1. INTRODUCTION

Object detection is a computer vision and image processing task that combines object classification
with object localization. Advances in deep learning [1-3] have led to significant progress in object
detection. Most recent research has focused on designing complex neural networks that improve
detection accuracy, such as the single shot detector (SSD) [4] and Faster R-CNN [5].

Many researchers have worked on applying computer technology and deep learning to everyday life
because of their outstanding advantages. Convolutional neural networks (CNNs) were applied to image data
(lung X-rays in particular) [3] to classify pneumonia, with a reported accuracy of 97%. AlexNet, a deep
convolutional neural network pre-trained on 1000 categories for image classification, was used in [6] to
detect and geotag advertisement billboards in real time, and the experiments achieved 92.7% training
accuracy for billboard detection. Using CNNs, Z. Rustam, et al. [7] proposed a method to assist doctors in
giving patients appropriate assessments and predictions; the results showed that the CNN could accurately
identify patients' X-ray test images. According to the results published in [8], a CNN model with a
64x64 input shape, a learning rate of 0.0001, a 3x3 filter size, 100 epochs, 160 training samples, and
40 test samples attained 100% training and testing accuracy in classifying golek puppet images.
This ideal result demonstrates the effectiveness of CNNs in object classification. An


Journal homepage: http://journal.uad.ac.id/index.php/TELKOMNIKA 


TELKOMNIKA Telecommun Comput El Control O 245 


application of transfer learning with a CNN based on the Inception-v3 architecture [9] was used for
early detection of Terry's nail. With 90% of the data used for training, the reported accuracy, precision,
and recall were 95.24%, 100%, and 90.91%, respectively. In particular, we consider you only look once
(YOLO), a unified model for object detection. The YOLO model [10] is simple to construct and can be
trained directly on full images. Unlike classifier-based approaches, Fast YOLO is the fastest
general-purpose object detector in the literature, and YOLO pushes the state of the art in real-time
object detection. YOLO also generalizes well to new domains, which makes it suitable for applications
that rely on fast, robust object detection. However, all of these algorithms require a large amount of
system resources, and they must be streamlined and compiled before they can run on resource-limited
hardware devices.

Ensuring maritime safety involves two main tasks. The first is protecting life and property at sea
from geographic and operational hazards (underwater obstacles, collisions, and harm and damage caused by
unfavorable weather conditions). The second is ensuring safe control of the ship by the crew throughout
the voyage: if a navigational officer cannot handle an emergency situation, a maritime collision can
result. For the first task, many studies aim to improve and upgrade current systems that have
shortcomings in availability, integrity, monitoring, and system life expectancy, such as the global
navigation satellite system [11] and the regional satellite augmentation system for maritime
applications [12], or the design of a satellite constellation for Indonesian maritime surveillance that
uses AIS data acquired by the LAPAN-A2 and LAPAN-A3 satellites [13], in which eight satellites in an
equatorial orbit provide near real-time AIS monitoring of Indonesia and other equatorial regions,
improving global maritime awareness and maritime safety. The second task is to design and manufacture
shipboard systems that ensure safe ship operation by using new computer technologies such as neural
networks, fuzzy-neural systems, or genetic algorithms.

In this paper, we apply the modified SSDLite MobileNetV2 bounded CNN algorithm to the bridge
navigational watch and alarm system (BNWAS). Extensive experiments showed that the proposed method
can achieve state-of-the-art results compared with the best current method based on hand-crafted
features [14], three other related CNN-based methods [15-17], and our previous work [18] on image
analysis. Moreover, we validated the rationality and robustness of the proposed model with
supplementary results. The inverted residual bottleneck layers allow a particularly memory-efficient
implementation, which is very important for mobile applications. A standard efficient implementation of
inference, for instance in TensorFlow [19] or Caffe [20], builds a directed acyclic compute hypergraph
G. For a small hardware system, we used the SSDLite MobileNetV2 structure because it is fast and
accurate. Not only were the requirements for image processing, object detection, and classification met,
but the system also complied with the IMO [21, 22], IEC [23], and [24, 25] regulations, so it could be
tested and operated directly on board. We carefully designed a new CNN-based method for detecting
various typical image-processing operations. The main contributions of this paper are as follows:

— We first converted the input image into residuals to suppress the influence of image contents, and then used 
a convolutional layer to increase the channel number. 

— We employed six similar layer groups to obtain the high-level features of the input image. 

— Finally, we fed the resulting features into the fully connected layer for classification. We also
proposed a solution that always keeps the total memory capacity within a robust bound and applied it
to the BNWAS.

The rest of the paper is organized as follows. Section 2 presents related work and proposes a
method that reduces memory while preserving image quality for object detection. Section 3 describes the
structure of the proposed BNWAS based on convolutional neural networks and presents the experimental
results and discussion. Finally, concluding remarks are given in section 4.


2. CNNs BASED SSD LITE-MOBILE NET METHOD FOR OBJECT DETECTION WITH 
LIMITED-MEMORY 

CNN models are highly accurate, but they share a common drawback: they are not well suited to
mobile applications or embedded systems with low computing power. In the literature, the authors
of [26] introduce resource-frugal quantized convolutional neural networks that reduce model size without
adversely affecting classification capability when segmenting hyperspectral satellite images, focusing
especially on the memory savings of quantized CNNs. Moreover, Prateeth Nayak, et al. [27] proposed an
approach that uses object class clustering to lower bit precision beyond quantization limits, with three
schemes: uniform-ASYMM, uniform-SYMM, and power-of-2. All of the quantization schemes achieved
near-original model accuracy for every tested model.

Running these models in real-time applications requires an extremely powerful configuration
(GPU/CPU), which embedded systems (Raspberry Pi, NanoPC) and smartphone applications do not have.
Therefore, we need to build a hybrid model such as SSDLite-MobileNet. The main factor that helps
SSDLite-MobileNet achieve high accuracy with low computation time is its hybrid structure, which
combines the SSD and MobileNet structures. SSD (single shot multibox detector) is an object detector
(Figure 1) that performs two main steps: extracting feature maps and applying convolution filters to
detect objects.

Figure 1. Structure of single shot multi box detector used to detect a navigational officer


The loss function [3]:

$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$ (1)

The loss function consists of two terms, $L_{conf}$ and $L_{loc}$, where N is the number of matched
default boxes. For the matched default boxes:

$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}(l_i^{m} - \hat{g}_j^{m})$ (2)

where $\hat{g}_j^{cx} = (g_j^{cx} - d_i^{cx})/d_i^{w}$, $\hat{g}_j^{cy} = (g_j^{cy} - d_i^{cy})/d_i^{h}$,
$\hat{g}_j^{w} = \log(g_j^{w}/d_i^{w})$ and $\hat{g}_j^{h} = \log(g_j^{h}/d_i^{h})$; $L_{loc}$ is the
localization loss, which is the smooth L1 loss between the predicted box and the ground-truth box
parameters. This loss function is similar to the one in Faster R-CNN. $L_{conf}$ is the confidence loss,
which is the softmax loss over multiple class confidences (c) ($\alpha$ is set to 1 by cross validation).

$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log(\hat{c}_i^{p}) - \sum_{i \in Neg} \log(\hat{c}_i^{0})$, where $\hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}$ (3)

$x_{ij}^{p} \in \{1, 0\}$ is an indicator for matching the i-th default box to the j-th ground-truth box
of category p. If m feature maps are used for prediction, the scale of the default boxes for each
feature map is computed as:

$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]$ (4)

Based on [24], we set the parameter $s_{min}$ to 0.2 and $s_{max}$ to 0.9 ($s_k$ is 0.1, 0.2, 0.375, 0.55,
0.725, 0.9, which corresponds to 30, 60, 112.5, 165, 217.5, 270 pixels for a 300x300 input image).
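
The box encoding and smooth L1 penalty used in the localization loss, and the scale rule for the default boxes, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, boxes are assumed to be (cx, cy, w, h) tuples, and the value 0.1 is assumed to be an extra scale added in front of the m = 5 scales produced by the linear rule, which is one way to reproduce the six values listed above.

```python
import math

def default_box_scales(s_min=0.2, s_max=0.9, m=5):
    """Linear scale rule: s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1), k = 1..m."""
    return [s_min + (s_max - s_min) / (m - 1) * (k - 1) for k in range(1, m + 1)]

def encode_box(g, d):
    """Encode a ground-truth box g relative to a default box d; both are (cx, cy, w, h)."""
    g_cx, g_cy, g_w, g_h = g
    d_cx, d_cy, d_w, d_h = d
    return ((g_cx - d_cx) / d_w,
            (g_cy - d_cy) / d_h,
            math.log(g_w / d_w),
            math.log(g_h / d_h))

def smooth_l1(x):
    """Smooth L1 penalty: quadratic for |x| < 1, linear otherwise."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def localization_loss(pred, g, d):
    """Smooth L1 loss between predicted offsets and the encoded ground truth for one match."""
    target = encode_box(g, d)
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))

# Assumption: 0.1 is an extra scale placed before the five scales from the linear rule.
scales = [0.1] + default_box_scales()
pixels = [round(s * 300, 1) for s in scales]  # box sizes for a 300x300 input
print(scales)
print(pixels)
```

A prediction that exactly matches the encoded ground truth gives a localization loss of zero, and the penalty grows only linearly for large offsets, which keeps outlier boxes from dominating training.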

The structure contains an initial full convolution layer with 32 filters followed by 19 bottleneck
layers. The detailed MobileNetV2 structure is described by M. Sandler [25]. The inverted residual
bottleneck layers allow a particularly memory-efficient implementation, which is very important for
mobile applications. A standard efficient implementation of inference is used in TensorFlow [19] or
Caffe [20]. The computation is scheduled to minimize the total number of tensors that need to be stored
in memory. In the most general case, it searches over all plausible computation orders $\Sigma(G)$ and
picks the one that minimizes

$M(G) = \min_{\pi \in \Sigma(G)} \max_{i \in 1..n} \left[ \sum_{A \in R(i, \pi, G)} |A| \right] + \mathrm{size}(\pi_i)$ (5)

where $R(i, \pi, G)$ is the list of intermediate tensors that are connected to any of the
$\pi_i, \ldots, \pi_n$ nodes, $|A|$ represents the size of the tensor A, and $\mathrm{size}(\pi_i)$ is the
total amount of memory needed for internal storage during operation i. For graphs that have only a
trivial parallel structure (such as residual connections), there is only one nontrivial feasible
computation order, and thus the total amount of, and a bound on, the memory M(G) needed for inference on
compute graph G can be simplified:

$M(G) = \max_{op \in G} \left[ \sum_{A \in op_{inp}} |A| + \sum_{B \in op_{out}} |B| + |op| \right]$ (6)
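
As a rough illustration of the simplified bound in (6), the sketch below evaluates it for a sequential list of operations. The data layout (a list of dicts with `inputs`, `outputs`, and an optional `internal` term), the function names, and the 56x56 tensor shapes are all hypothetical choices made for this example; sizes are counted in elements rather than bytes, and the residual connection is ignored.

```python
from math import prod

def tensor_size(shape):
    """Number of elements in a tensor with the given shape."""
    return prod(shape)

def memory_bound(ops):
    """Bound (6): max over operations of input sizes + output sizes + internal storage."""
    return max(
        sum(tensor_size(s) for s in op["inputs"])
        + sum(tensor_size(s) for s in op["outputs"])
        + op.get("internal", 0)
        for op in ops
    )

# One inverted residual block written as three separate operations
# (hypothetical shapes: 24 bottleneck channels, expansion factor 6).
unfused = [
    {"inputs": [(56, 56, 24)], "outputs": [(56, 56, 144)]},   # 1x1 expansion
    {"inputs": [(56, 56, 144)], "outputs": [(56, 56, 144)]},  # 3x3 depthwise
    {"inputs": [(56, 56, 144)], "outputs": [(56, 56, 24)]},   # 1x1 projection
]

# The same block treated as a single operation; internal storage is taken as 0
# here as a simplification.
fused = [{"inputs": [(56, 56, 24)], "outputs": [(56, 56, 24)]}]

print(memory_bound(unfused))  # dominated by the expanded inner tensors
print(memory_bound(fused))    # dominated by the bottleneck tensors
```

Under these assumptions the fused view needs about one sixth of the memory of the unfused view, because only the narrow bottleneck tensors have to be materialized at the same time.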


Following [25], the amount of memory is simply the maximum total size of the combined inputs and
outputs across all operations. This means that if we treat a bottleneck residual block as a single
operation (and treat the inner convolution as a disposable tensor), the total amount of memory is
dominated by the size of the bottleneck tensors rather than by the size of the (much larger) tensors
that are internal to the bottleneck. In a TensorFlow graph, each node has zero or more inputs and zero or
more outputs and represents the instantiation of an operation. Values that flow along normal edges in
the graph (from outputs to inputs) are tensors, arrays of arbitrary dimensionality whose underlying
element type is specified or inferred at graph-construction time. For small applications, reducing
memory while preserving image quality is very beneficial. However, pushing this too far can easily lead
to instability in image processing, such as reduced image quality, which relates to the marginal limit
of the total memory capacity. In this paper, we propose a solution that always keeps the total memory
capacity within the following robust bound on op, given in (7):


M(G) = max |Z acopinplAl + LacopouelBl + Iloplle| (7) 
Similar with 
MG) = max |ZacopinplAl + Lacopousll| + lop (8) 


Then, for the hybrid of SSD and MobileNetV2, we replaced all regular convolutions with separable
convolutions in the SSD prediction layers [2] to reduce the number of parameters and decrease the
total memory capacity, as shown in (8), while still maintaining the bound on the computing steps.
The output is labeled with the detected object and its confidence level as a percentage. In the
experiments of this paper, the improved SSD-MobileNetV2 method also showed higher efficiency than the
method of [25], especially when applied to the BNWAS.
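
To see why replacing regular convolutions with separable ones reduces the parameter count, the following sketch compares the two for a single layer. The function names and the layer size (3x3 kernel, 256 input channels, 512 output channels) are hypothetical and chosen only for illustration; bias terms are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights in a regular k x k convolution."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical prediction layer: 3x3 kernel, 256 -> 512 channels.
regular = conv_params(3, 256, 512)
separable = separable_conv_params(3, 256, 512)
print(regular, separable, round(regular / separable, 2))
```

For this layer the separable form needs roughly nine times fewer weights, which is close to the k x k = 9 factor expected when the number of output channels is large.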


3. APPLYING CNNs TO DESIGN THE BRIDGE NAVIGATIONAL WATCH AND ALARM 
SYSTEM 
3.1. BNWAS design based on regulations of IMO MSC. 128 (75) 

Ships usually operate in complex and vulnerable environments, so ship development remains a
challenging problem that continues to attract significant research effort. Researchers have studied
ship systems [27-30] to meet the IMO standards. Recently, the authors of [18] studied and applied the
modified SSDLite MobileNetV2 hybrid algorithm to BNWAS using hardware based on the Raspberry Pi 3 to
meet the requirements of IMO MSC. 128 (75) and SOLAS Chapter V, Reg. 19, MSC. 282 (86) [23], revised on
June 5, 2009 [20], which applies to ships classified by size:

— July 2011: new vessels in excess of 150 tonnes. 

— July 2011: all passenger vessels. 

— July 2012: all vessels in excess of 3,000 tonnes. 

— July 2013: all vessels between 500 and 3,000 tonnes. 
— July 2014: all vessels between 150 and 500 tonnes. 

BNWAS is a monitoring and alarm system that notifies other officers or the captain if the officer on
watch (OOW) does not respond or is incapable of performing the watch duties efficiently, which can
lead to maritime accidents. The system monitors the awareness of the OOW and automatically alerts the
Master or another qualified OOW if for any reason the OOW becomes incapable of performing duties. This
is achieved through a mix of alarms and indications that alert backup OOWs as well as the Master.
BNWAS warnings are given in the case of incapacity of the watchkeeping officer due to accident,
sickness, or a security breach, e.g. piracy and/or hijacking. Unless the Master decides otherwise, the
BNWAS shall remain operational at all times.

Outputs of the system should be available for connection of additional bridge visual indications,
audible alarms and remote audible alarms as in [9]. The actual system is shown in Figure 2 (a) and its
design diagram in Figure 2 (b). The connected computer works in tandem with the Raspberry Pi 3 (which
plays the role of the central processing board in Figure 2 (b)) to collect input and output data during
the testing process. The hardware is designed to perform the alarm functions.

To compare the effectiveness of the solution with other applications under the hardware and
practical conditions on the bridge of the Saigon Millennium ship, we deployed four solutions and
collected their results. In this work, we focused on two factors, processing speed and output
reliability, when applying object detectors on the designed system using the modified SSDLite
MobileNetV2 bounded CNN algorithm.

Figure 2. The designed BNWAS-GTS.V1 system tested at HCM City University of Transport;
(a) BNWAS-GTS.V1 system tested at HCM City University of Transport, and (b) structure of the designed BNWAS

3.2. Testing the designed BNWAS on Saigon Millennium Vessel in Saigon River

The image was recorded on the Saigon Millennium ship at Son Hai Shipyard, Ho Chi Minh City,
Vietnam. It was captured with a Logitech C270 camera and processed by the hybrid network-based object
identification algorithm SSD-MobileNetV2. The output is the processed image with the frame of each
detected object and its reliability as a percentage. With the technique used in this paper, the system
can identify multiple officers on the bridge, up to a maximum of 20 people in the detection frame at a
time. When officers are identified on the bridge, the system allows customized functions via the touch
screen or a push-button on the bridge. The designed BNWAS was tested on the Saigon Millennium vessel on
the Saigon River as follows:

— Case 1: if the system determines that there is no officer on the bridge, a timer is started and
counts down while waiting for an officer to appear. While the timer is active, the mode-switching and
countdown-timer functions are disabled. If an officer appears on the bridge during the countdown
(no physical interaction with the system is needed), the timer is reset and the system returns to its
normal state, in which officers can operate the system function keys.

— Case 2: if no officer returns and the timer reaches zero (timeout), a flashing warning signal is
activated on the bridge; this is the primary alarm stage. The signal can be seen from anywhere on the
bridge, in accordance with IMO standards. The alarm level appears on the display screen, all system
parameters are saved to the history file, and a new timer is started to move to the next alarm
stage. The subsequent alarm stages were also tested, and the final results are consistent with IMO
requirements. The system not only recognized the presence of an officer on the bridge but also analyzed
the officers' actions and issued warnings when an officer stood still for too long or slept while on
duty. In the experiment, the test detected an officer who sat motionless for too long or showed signs of
drowsiness, as in Figure 3.
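
The two cases above amount to a small timer-driven state machine, which can be sketched as follows. This is an illustrative sketch, not the BNWAS-GTS.V1 firmware: the class and field names are hypothetical, the timeout value is a placeholder because the text does not state it, only the primary alarm stage is modeled, and clearing the alarm when an officer reappears is an assumption.

```python
from dataclasses import dataclass

@dataclass
class WatchTimer:
    timeout_s: float           # hypothetical countdown length in seconds
    remaining_s: float = 0.0
    counting: bool = False
    alarm_stage: int = 0       # 0 = normal, 1 = primary alarm

    def update(self, officer_present: bool, dt_s: float) -> int:
        """Advance the timer by dt_s seconds given the latest detection result."""
        if officer_present:
            # Case 1: a detection alone resets the timer, no button press is needed.
            # Assumption: it also clears a primary alarm that is already active.
            self.counting = False
            self.alarm_stage = 0
            return self.alarm_stage
        if not self.counting:
            # No officer detected: start the countdown.
            self.counting = True
            self.remaining_s = self.timeout_s
        self.remaining_s = max(0.0, self.remaining_s - dt_s)
        if self.remaining_s == 0.0 and self.alarm_stage == 0:
            # Case 2: timeout reached, raise the primary (visual) alarm.
            self.alarm_stage = 1
        return self.alarm_stage

timer = WatchTimer(timeout_s=3.0)
stages = [timer.update(present, 1.0) for present in (False, False, True, False, False, False)]
print(stages)
```

In this run the officer reappears after two seconds, so the first countdown is cancelled, and the primary alarm is raised only after three further consecutive seconds without a detection.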

The test was recorded when we asked an officer to sit motionless in the driver's seat (for at least
20 seconds) to check whether the system detects an officer who stays still for too long or is drowsy.
At the same time, a background counter analyzes the relative position of the officer and reports a
relative error. Based on the analysis of each frame, if after 20 seconds the relative position error
does not exceed 10%, the primary alarm is set and the next alarm timer starts counting down.
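
A possible form of this stillness check is sketched below. The text does not define how the relative position error is computed, so this sketch assumes it is the displacement of the detection-box center from the first frame, divided by the diagonal of the first box; the function names and the (cx, cy, w, h) box format are likewise hypothetical. Only the 20-second window and the 10% threshold come from the description above.

```python
import math

def relative_position_error(ref_box, box):
    """Center displacement from the reference box, relative to the reference box diagonal."""
    ref_cx, ref_cy, ref_w, ref_h = ref_box
    cx, cy, _, _ = box
    return math.hypot(cx - ref_cx, cy - ref_cy) / math.hypot(ref_w, ref_h)

def stillness_alarm(frames, window_s=20.0, threshold=0.10):
    """Return True if the officer stayed within the threshold for at least window_s seconds.

    frames is a time-sorted list of (timestamp_s, box), with box = (cx, cy, w, h).
    """
    if not frames or frames[-1][0] - frames[0][0] < window_s:
        return False  # not enough observation time yet
    ref_box = frames[0][1]
    return all(relative_position_error(ref_box, box) <= threshold for _, box in frames)

# One detection per second for 20 seconds, officer not moving.
still = [(float(t), (100.0, 100.0, 30.0, 40.0)) for t in range(21)]
# Same sequence, but the officer shifts 10 pixels at t = 10 s.
moved = list(still)
moved[10] = (10.0, (110.0, 100.0, 30.0, 40.0))

print(stillness_alarm(still), stillness_alarm(moved))
```

With this definition a 10-pixel shift of a 30x40 box gives a relative error of 0.2, which exceeds the 10% threshold and prevents the alarm.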

Figure 3. Testing the designed BNWAS on Saigon Millennium Vessel in Saigon River;
(a) testing no alarm stage, and (b) testing alarm stages


3.3. Summary of experimental results

Highly configurable models running on TITAN X GPUs achieved processing speeds between 17 and
37 frames per second. However, when evaluated on the COCO dataset with mAP computed over all object
classes, they reached only 21-28%. The processing speed of our system was tested directly on the bridge
under normal working conditions, and the detection reliability was high, from 76% to 97%, as shown in
Table 1.


Table 1. Test performance results of the four models in the experiment

Model name                       Test on GPU TITAN X          Test on Raspberry Pi 3B+
                                 Speed (ms)   COCO (mAP)      Speed (FPS)   Real time on bridge (mAP)
ssd_mobilenet_v1_coco            30           21              1.05          76
ssd_mobilenet_v2_coco            31           22              0.83          94
ssdlite_mobilenet_v2_coco        27           22              1.08          86
faster_rcnn_inception_v2_coco    58           28              0.08          97


This result was achieved with the camera installed in a convenient position on the bridge, while
the hardware is a mobile-class device with only an ARM CPU and no integrated GPU. The highest
processing speed is only approximately 1 FPS. The experimental results are detailed in Table 1. They
show that the four models tested on our hardware (Raspberry Pi 3B+) with our method reached higher mAP
on the bridge (76-97%) than on COCO with the TITAN X GPU (21-28%), although the two settings differ in
both hardware and test data, while the processing speed on the Raspberry Pi is far lower. The FPS of
the tested methods is shown in Figure 4, and it is a good response rate for a monitoring system.

The output reliability is highest with the Faster R-CNN detector; however, at 0.08 FPS
(about 12.5 seconds to process a frame) it cannot meet the requirements of a monitoring system. Object
detectors based on the SSD_MobileNet structure (in brown) produce highly reliable results and meet the
processing speed requirements. Meanwhile, the results of SSD_MobileNetV1 (yellow) and SSD_MobileNetV2
(green) are almost equivalent, but their model load time is slow because of the large model size, and
the actual output still shows certain deviations. Thus, the improved SSDLite MobileNetV2 solution gives
good results in terms of quality, processing speed, and model load time (running stably on the
Raspberry Pi 3) and has higher accuracy than the other solutions.
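
The per-frame latency quoted above follows directly from the FPS column of Table 1, as the short sketch below shows. The dictionary and function names are hypothetical; the FPS values are the Raspberry Pi 3B+ measurements reported in the table.

```python
# Raspberry Pi 3B+ processing speeds (FPS) taken from Table 1.
PI_FPS = {
    "ssd_mobilenet_v1_coco": 1.05,
    "ssd_mobilenet_v2_coco": 0.83,
    "ssdlite_mobilenet_v2_coco": 1.08,
    "faster_rcnn_inception_v2_coco": 0.08,
}

def seconds_per_frame(fps):
    """Time needed to process one frame at the given frame rate."""
    return 1.0 / fps

latency = {name: seconds_per_frame(fps) for name, fps in PI_FPS.items()}
fastest = max(PI_FPS, key=PI_FPS.get)

for name, seconds in latency.items():
    print(f"{name}: {seconds:.2f} s per frame")
print("fastest:", fastest)
```

The three SSD-based detectors process a frame in roughly one second, whereas the Faster R-CNN detector needs about 12.5 seconds per frame.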

Figure 4. The FPS speed of the test methods; (a) compare processing speed of object detectors on BNWAS
hardware and (b) compare the output reliability of object detectors on BNWAS hardware


4. CONCLUSION

In this paper, we studied and applied the modified SSDLite MobileNetV2 bounded CNN algorithm
to BNWAS-GTS.V1. The hardware was based on the Raspberry Pi 3, an embedded single-board computer
with a smartphone-class CPU, limited RAM, and no CUDA GPU. The processing speed of BNWAS-GTS.V1 was
tested directly on the bridge under normal working conditions. The results were achieved with the
camera installed in a convenient position on the bridge and with a mobile-class device as hardware. The
improved SSD-MobileNetV2 based on the bounded CNN algorithm also showed higher efficiency, especially
when applied to the BNWAS.